SALSA: Sequence ALignment via Steiner Ancestors

Author

Giuseppe Lancia

Abstract

We describe SALSA (Sequence ALignment via Steiner Ancestors), a public-domain suite of programs for generating multiple alignments of a set of genomic sequences. We allow the use of either of two popular objectives, Tree Alignment or Sum-of-Pairs. The main distinguishing feature of our method is that the alignment is obtained via a tree in which the internal nodes (ancestors) are labeled by Steiner sequences for triples of the input sequences. Given lists of candidate labels for the ancestral sequences, we use dynamic programming to choose an optimal labeling under either objective function. Finally, the fully labeled tree of sequences is turned into a multiple alignment. Enhancements in our implementation include the traditional space-saving ideas of Hirschberg as well as new data-packing techniques. The running-time bottleneck of computing exact Steiner sequences is handled by a highly effective and much faster heuristic alternative. In addition, other modules in the suite automatically generate linear-program input files that can be used to compute novel lower bounds on the optimal values. We also report on some preliminary computational experiments with SALSA.
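The labeling step described above (choosing one label per ancestor from a list of candidates so that the whole tree is cheapest) can be illustrated with a small post-order dynamic program. This is a minimal sketch under stated assumptions, not SALSA's implementation: it covers only the Tree Alignment objective (sum of edge distances), uses unit-cost edit distance, and takes the candidate lists as given rather than computing Steiner sequences for triples. The names `edit_distance`, `best_labeling`, `children` and `candidates` are hypothetical. The distance routine keeps only two DP rows, which saves space for the distance value alone; Hirschberg's technique goes further and recovers the alignment itself in linear space, which this sketch does not do.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance, storing only two DP rows (assumed scoring)."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                         cur[j - 1] + 1,                        # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # match or substitute
        prev = cur
    return prev[len(b)]


def best_labeling(children: Dict[str, List[str]],
                  candidates: Dict[str, List[str]],
                  root: str) -> Tuple[int, Dict[str, str]]:
    """Pick one candidate label per node minimizing the total edge cost of the tree.

    Hypothetical helper. Leaves are expected to have a single candidate:
    their own input sequence.
    """
    cost: Dict[str, List[int]] = {}               # cost[v][k]: best subtree cost if v takes candidates[v][k]
    choice: Dict[Tuple[str, int, str], int] = {}  # best candidate index of child u, given (v, k)

    def solve(v: str) -> None:
        kids = children.get(v, [])
        for u in kids:
            solve(u)  # post-order: children are solved before their parent
        row = []
        for k, label in enumerate(candidates[v]):
            total = 0
            for u in kids:
                # cost of giving u its j-th candidate under parent label `label`
                options = [cost[u][j] + edit_distance(label, cand)
                           for j, cand in enumerate(candidates[u])]
                j_best = min(range(len(options)), key=options.__getitem__)
                choice[(v, k, u)] = j_best
                total += options[j_best]
            row.append(total)
        cost[v] = row

    solve(root)
    k_root = min(range(len(cost[root])), key=cost[root].__getitem__)

    labeling: Dict[str, str] = {}

    def trace(v: str, k: int) -> None:
        labeling[v] = candidates[v][k]
        for u in children.get(v, []):
            trace(u, choice[(v, k, u)])  # follow the stored optimal child choices

    trace(root, k_root)
    return cost[root][k_root], labeling


if __name__ == "__main__":
    # Toy star tree: one ancestor "r" over three leaves, two made-up candidates for "r".
    children = {"r": ["s1", "s2", "s3"]}
    candidates = {"r": ["ACGT", "AAAA"], "s1": ["ACGT"], "s2": ["ACGA"], "s3": ["ACTT"]}
    print(best_labeling(children, candidates, "r"))
    # (2, {'r': 'ACGT', 's1': 'ACGT', 's2': 'ACGA', 's3': 'ACTT'})
```

Because each child's best candidate is chosen independently once the parent's label is fixed, the work is proportional to the sum over edges of (parent candidates × child candidates) distance computations; this sketch recomputes distances rather than caching them.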

Similar resources

SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments

MOTIVATION Optimal sequence alignment based on the Smith-Waterman algorithm is usually too computationally demanding to be practical for searching large sequence databases. Heuristic programs like FASTA and BLAST have been developed which run much faster, but at the expense of sensitivity. RESULTS In an effort to approximate the sensitivity of an optimal alignment algorithm, a new algorithm h...

Steiner Points in the Space of Genome Rearrangements

We present some experiences with the problem of multiple genome comparison, analogous to multiple sequence alignment in sequence comparison, under the inversion and transposition distance metrics, given a fixed phylogeny. We first describe a heuristic for the case in which the phylogeny is a star on three vertices and then use this to approximate the multiple genome comparison problem via local search.

Attacking Generalized Tree Alignment

Many multiple alignment methods implicitly or explicitly try to minimize the amount of biological change implied by an alignment. At the level of sequences, biological change is measured along a phylogenetic tree, a structure frequently being predicted only after the multiple alignment instead of together with it. The Generalized Tree Alignment problem addresses both questions simultaneously. I...

Regular Language Constrained Sequence Alignment Revisited

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings...

Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.

A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unli...

Publication date: 2007